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Abstract 

This paper derives practical algorithms, based on Bayesian 
inference methods, for several data analysis problems common in 
time series analysis of astronomical and other data. One problem is 
the determination of the lag between two time series, for which the 
cross- correlation function is a sufficient statistic. The second problem 
is the estimation of structure in a time series of measurements which 
are a weighted integral over a finite range of the independent variable. 

1. Workhorse Algorithms for Time Series Analysis 

Bayesian methods are becoming more popular for the challenging 
data analysis problems facing the modern astrophysicist, but 
the pace is agonizingly slow. I believe the main difficulty is the 
perception that Bayesian methods must be implemented in complex, 
special-purpose routines made for a single application and requiring 
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copious computational resources. Progress will be accelerated by 
the availability of turn-key algorithms for elementary data analysis 
problems. 

Larry Bretthorst has pioneered in developing Bayesian methods 
for the detection of periodic signals in noisy data (Bretthorst 1988, 
2001). He computes the posterior distribution of the frequency 
parameter in a model consisting of a single sinusoidal component, 
having marginalized the amplitude and phase parameters. It is 
encouraging that this work is making its way into a number of 
active areas in astronomy, including variable star research and more 
recently discovery of extra-solar planets. The present work applies 
the methods clearly outlined in (Bretthorst 1988) to another common 
astronomical problem - the detection of lags between two or more 
signals. 

2. Lags in Time Series Data 

In engineering and science, including both experimental and 
observational sciences, such as astronomy, one often wishes to find 
the delay between two time series. This somewhat complex mixture 
of questions includes: Are the two time series related? If they are, is 
one a delayed version of the other? If so, what is the best estimate of 
the value of the lag? 

2.1. The Model 

A straightforward approach is to define a generic model expressing 
one signal as a delayed and scaled version of the other, and then 
derive the posterior probability distribution of the parameters 
representing the lag and the scale factor. This procedure can be 
carried out making few assumptions about the signal, and none about 
signal shape. From this posterior one can easily compute means and 
confidence intervals for lags and scale factors. 
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Let X and Y denote two observables. Assume that the underlying 
process being sampled here is an unknown signal, S, superimposed 
on a constant background, B. If a negative signal is impossible for 
physical reasons an appropriate prior can impose the condition S >0. 

The backgrounds can be treated as unknown nuisance parameters, 
assigned a prior probability distribution, and marginalized. If the 
backgrounds are accurately fixed by other data, so the prior 
distribution is very narrow, one can sometimes get away with treating 
the backgrounds as known constants. 

The model of the observables, expressing delay and scaling 
between the two signals, is then: 



where m represents the independent variable, often time, the lag 
is r, and we allow the y-signal to be an overall factor a times the 
X-signal. Of course, a may be less than, equal to, or greater than 1. 

We now discuss two data modes common in astronomy, namely 
time-tagged events and evenly sampled time series with normal 
errors. The different nature of the observational errors in these two 
cases means that they are represented differently in the model, as will 
be seen in the next two sections. 



We begin by treating event data, sometimes called time-tagged 
event (TTE) data in the astronomical literature. Such data are 
simply the set of times at which events occurred - usually within 
a fixed interval, starting at time and ending at time T. Here we 
assume that the only observational noise is due to the randomness of 




+ By 



(1) 
(2) 



2.2. Time- Tagged Event (TTE) Data 
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the events. 

The times are not, of course, recorded with infinite precision. 
They are quantized in small units, here called time-ticks, defined by 
the computer clock that drives the data acquisition system. Setting 
the time-tick to unity, the event times are a set of integers satisfying 

< mi < 7712 < ms < . . . < tun-i < uin < T (3) 

where is the total number of events. Often the detection process 
mandates the condition rrii ^ m^+i indicated in Eq. Indeed, it is 

almost always the case that each event is followed by a short interval 
during which the instrument cannot detect any subsequent event. We 
ignore this detector dead time. 

It is useful to represent TTE data as a series of delta functions: 

{1 if event at time m 
m=l,2,...,M (4) 
if no event at time m 

where M is the total length of the observation interval in time-ticks, 
and X is the observed value of X (similarly for y and Y). 

As mentioned above, we assume that the only observational noise 
is that due to the randomness of the events. Equations and (@) 
give the probability of detecting an event during tick m. Typically 
the instrument is designed so that these probabilities X^,!^™ are 
<< 1. In this truncated Poisson process the probability of no X-event 
at time m is e^^™. Hence the likelihood is simply 



^E.g., photon detection, the most common astronomical apphcation, is inherently 
discrete due to the quantum nature of light. The physical parameter is the expected 
rate of photon detection, determined by the incident photon intensity and the 
instrument's detection efficiency. 

^ In a few cases - e.g. multiple detectors in a single spacecraft - this is not true. 
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Similarly for Y 

It is fundamental to this analysis that the and Y^ are all 
independent with respect to the measurement noise process. Hence 
the total likelihood is the product of the individual ones: 

M 

Ltotal = n L{xm\Sm, Bx)L{yjn\Sm-T, Br.a) (7) 

m—1 

Note that with these likelihoods there is an issue connected with 
wraparound that is essentially the same as with cross correlation 
functions of any kind. The expressions derived here assume that the 
data may be allowed to wraparound. 

It is more convenient to use the form 

Ym+T = + By , (8) 

equivalent to Eq. (0), to transform the expression for L in Eq. (|7|) to 

M 

Ltotal = l[ L{xjji\Sjji, Bx)L{yjn+T\Sm, BY,a) (9) 

m=l 

SO that we can write the total likelihood as the product of factors, 
each of which depends on the same signal variable, Sm'- 

M 

Ltotal = n Ljn(Sjn) (10) 
Tn=l 

where 

Lm{Sm) = L{Xm\Sm, Bx)L{y„i+r\Sm, By, o) (11) 

We can individually marginalize the signal parameters Sm-, which 
for the purposes of determining the lag and scale factor are nuisance 
parameters. Dropping the subscript on the now dummy varible 5^, 
and adopting a prior P{S), the marginalized posterior is 

/+00 
P{S)L{xm\S, Bx)L{ym+T\S, By, a)dS (12) 
-oo 
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That is to say, we have 

M 

Gtotai{r,a) = U G{xm,ym+T) (13) 

m=l 

or 

Gtotal{T,a) = G'o,0°^0,l^^l,o"^l,l^ (14) 

where 



and 



Nofl{r) = number of 


m for which 




= and 


Vm+T — 


(15) 


No At) = 




•^m 


= and 


Um+T 


1 (16) 


Ni,o{r) = 






= 1 and 


Vm+T 


(17) 


NiAr) = 




•^m 


= 1 and 


Vm+T 


1 (18) 


Ga,p = lP{S)L{Xm = 

a = 


a\S,Bx)L{ym+T = 
= 0,1; ^ = 0,1 


m Bx 


. a)dS 


(19) 



Note that the N^s depend on the data and on r, but not on a; the 
Cs depend on the model's form, priors on model parameters, and on 
a, but not on r. 

The N''s as functions of lag r are conveniently found from the 
cross-correlation function of X and Y, defined as 

M 

7x,y(r) = E Xm+rVm (20) 

m=l 

This function is readily and rapidly computed, using the fast Fourier 
transform, by representing X and Y as arrays of zeros punctuated by 
unit amplitude (^-functions at the m at which events occur. 

It can be shown that 



= 7x,y(r) (21) 

A^i,o = Nx- 7x,F(r) (22) 

A^o,i = Ny- 7x,y(r) (23) 

iVo,o= M - Nx - Ny + -fxA^) (24) 
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where M is the number of values of m spanning the observation 
interval [cf. Eq. (g)], and A^^: and Ny are just the number of events. 
Since the four combinations exhaust all possibilities, these quantities 
should, and obviously do, satisfy 

A^o,o + A^o,i + A^i,o + M,i = M (25) 



Adopting the uniform prior 



PiS) = ^^{h ^fJ'^^^^' (26) 



- ^0 I else 
the G's are easily found to be: 

Gi^i = 1 - Ex(l){So, Si) - EY(l){aSo, aSi) + ExEy(P{pSq, pSi) (27) 

Gi,o = EY(i){aSQ,aSi) - ExEY(t){pSo, pSi) (28) 

Go,i = Excf^iSo, Si) - ExEY(t>{pS^^ pSi) (29) 

^0,0= ExEY(t){pSQ,pSi) (30) 



where p = 1 + a. 



Ex = e-^- (31) 
Ey = e"^^ (32) 



and 



0(2^,2/) = (33) 

x-y 



It is instructive to take log of the likelihoods in Eq.([T^), as 
follows: 

logGtotai{r, a) = Nq^qIoqCq^ + Nq^iIoqGq^i + NifilogGifi + Ni^ilogGi^i 

(34) 

and Equations ([?T|-|2^) permit a representation in the form 

logGtotai{r, a) = co{a) + ci{a) jxxi^) (35) 

where 

co(a) = [M-Nx- NY]logGofi + NylogGo^i + NxlogGi^o (36) 
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and 

ci{a) = logGofi - logGo^i - logGifl + logGi^i (37) 

Accordingly, we have for the posterior probability density 

Gtotai{r, a) = e"°(")e"i(") ^^'^(^^ oc e'^^") ^^'^(^^ (38) 

Note the similarity of Eq. (|38| ) to an analogous result in harmonic 
analysis - detecting a sinusoidal signal in the presence of noise - giving 
the posterior probability density for the frequency u [|bretthorst i988| , 
Eq. (2.7)] 

P{u\D,aJ) oc (39) 

where C{u) is the periodogram, D represents the data, a is the 
variance of the noise, here assumed known, and / is the prior 
information. The cross correlation function, 7 is a sufficient statistic 
for lags, just as the periodogram is for frequencies ( [bretthorst i988|) . 



Note that the maximum likelihood value of the lag r is just the 
value that maximizes the cross-correlation function. The mean value 
of r, weighted by the posterior in Eq. ([38|), may be a better lag 



estimator. This posterior is also useful for computing confidence 
intervals. 



2.3. Evenly Spaced Data 

For simplicity, in this section we ignore the background 
component, and consider noisy measurements of two signals 

= Sjji^ + Rj^ (41) 

where m = 1,2, ... ,M are a set of evenly spaced times, and the true 
signals S^'^ are corrupted by additive noise, R^'^ . Consider the 
model 

stating that one signal is a delayed and scaled version of the other. 
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Assume that the noise has a normal distribution [see (|bretthorst 
T^H^) for relevant discussion], so that the likelihood for X at time m 



IS 



erf V27r 



Similarly for Y at time m 



which, from Eq. (^), becomes 



e 2(^^)2 



(43) 



(44) 



(45) 



With the usual independence assumption, the total hkelihood is 



1 M 

p _ r ^ \M 



■)"' n 



1 



X .2 



27r (7^(7^ 



e 2(0-^)2" 



That is 



where 



M 



2(cr, 



X x2 



Ptotai = i'o n e ^f'^™) 

m=l 



2(-T)^ 



1 M 1 

= )" n ^ 



(46) 



(47) 



(48) 



Shifting m ^ m + r in part of this expression gives an equivalent 
form in which factors involving the same 8^ are kept together: 



total 



M . 

^0 n e 

777=1 



X-,2 



2((T, 



XTT" 



M 

n e 

777=1 



Expanding the argument of the exponential: 



-^777 



2XjyiS^ -\- ((5*, 



Vm+T 



or 



2 



(49) 



(50) 



Y \2i m 



2' a. 



X\2 



m+T/ 



(51) 
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Rewrite this as 



where 



1. 



X\2 
m ) 



r = 

— 

-Bm — 

A = 



1 r X, 



m _|_ Vm+T 
X\2 ' 



X\2 



2 (C^m, 
1. 1 



+ 



+ 



Still following ( pretthorst i988| ) we complete the square: 



SO that the marginalizations of the 5*^ become 

M 



1 J— oo 



m=l 
m=l 

m=l 



oo 



TO 



TT 



-a 



\ ^TO 

M 

TO=1 



TT 



(52) 

(53) 
(54) 

(55) 

(56) 

(57) 
(58) 
(59) 
(60) 



It is instructive to make some simplifications. Assume the 
variances are constants, independent of m. (If this is not true, not 
much simplification is possible, but the general expressions are readily 
evaluated numerically.) Then 



M 1 2 



m—1 



y-M 2 
H / ^^No I — ^0 



Y\2 



2' {a^y 

is just a constant, independent of r (and a). Furthermore, 



(61) 



a 



(62) 
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is also constant as far as m and r are concerned, although it does 
depend on a. 



Hence 



P(r, a\D) = Po[^]^e-^^^oe^(^'«) (63) 
A[a) 



where 



E 7[7-yT9 + no ] (65) 



A{a) 4 ((7^)2 {al^.f 



where 



and 



4^(a)((T^)4 4^(a)((7^)2((T^)2 4yl(a)((T^)4 
= K^{a) + Ki{a)-ixA^) (67) 

y-M 2 o2v*^ 7/2 

2^(a)(cr^)2(cr^)2 (acr^)2 + (cri")2 ^ ^ 

are independent of m and r, and 

M 

7X,y(r) = E ^rnVm+T (70) 
m=l 

is the ordinary cross correlation function. Note that the cross term 
is the only one where the r dependence disappears due to the 
summation. Thus we can write 

PiT,a\D) = Po[-^]fe-^^^^°e^°(")e^^(«)^^^^(^) (71) 

74(a) 

again in the same form as in Eq. (|58|). If a is fixed, we have 

P{r,a\D) ~ e^i^'^)^^'^^^) (72) 

or 

logP{T, a\D) ^ K,{a)jxAr) - ^Jxyl^lry (^3) 

Eq. ( |7T1) can be used to compute various quantities related to the 
lag, the scale factor, and their variances. 
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Structure in Time Series 



We turn briefly to a different problem, namely estimating 
the signal itself. This section is an extension of the Bayesian 
Blocks ( [Scargle 199^ , [Scargle :^UUU|) method to the case where the 



measurements have a normal error distribution and refer to an 
extended range of the independent variable. 



Bayesian Blocks: Uneven Samples with Normal Errors 




Fig. 1. — Piecewise (block) representations. Dashed lines: true model. Points and 
error bars: the synthetic data. Solid line: Bayesian Block estimate. 



Figure 1 shows the block representation for a toy problem 
with just three blocks, and delta function spread functions for the 
independent variable. Note that the change point locations have been 
determined essentially exactly. 
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3.1. The Data 

The data consists of measurements of a function y{x), not actually 
confined to the single value of x but instead a weighted averaged over 
a range of a:-values 

Data = {yn, x^, (Tn, 'Wn{x),n = 1, 2, . . . , TV} (74) 

where cr„ is the known variance of measurement n, and Wn{x) is the 
weighting function, allowed to be different for each datum. 



3.2. The Model 

We assume the standard piece-wise constant model of the 
underlying signal, that is, a set of contiguous blocks: 

B{x) = E B^^\x) (75) 
i=i 

where each block is represented as a boxcar function: 

the Q are the changepoints, satisfying 

min{xn) < Ci < C2 < • • • < 0+1 < • • • < CiVb < max{xn) (77) 

and the Bj are the heights of the blocks. 

The value of the observed quantity, yn-, at under this model is 

yn = I Wn{x)B{x)dx 

= Iwn{x) j:fliB^^\x)dx 

= j:fllIWn{x)B^-^\x)dx (^^) 



so we can write 



yn=j:BjGjin) (79) 
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where ^ 

Gj{n) = / ^ Wn{x)dx (80) 

is the inner product of the n-th weight function with the support of 
the j-th block. The analysis in (|Bretthorst i988|) showns how do deal 
with the non-orthogonality that is generally the case here.Q 



3.3. The Posterior 



The averaging process in this data model induces dependence 
among the blocks. The likelihood, written as a product of likelihoods 
of the assumed independent data samples, is 

P(Data|Model) = n^^i P(2/„|Model) (81) 

= , ^e-i<-?')^ (82) 

= n^Li ^ ) (83) 



= ge-2( h-n ) , (84) 

where 

N I 

Q = n • (85) 

n=l V^TTCT^ 

After more algebra and adopting a new notation, symbolized by 

- - 2/n (86) 

and 

^ - G,{n) , (87) 



^If the weighting functions are delta functions, it is easy to see that Gj{n) is 
non-zero if and only if x„ lies in block j, and since the blocks do not overlap the 
product Gj{n)Gk{n) is zero for j ^ k, yielding orthogonality, J2n Gj{n)Gk{n) = 6j^k- 
And of course there can be some orthogonal blocks, for which there happens to be 
no "spill over", but these are exceptions. 
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we arrive at 

logPi{yn}\B) = Qe'^ , 
where 

N Nb N Nb Nb N 

H^T. yl-2 E E 2/nG,(n) + E E B,Bk E G,{n)Gk{n) . (89) 

n=l j=l n=l j—1 k=l n=\ 

The last two equations are equivalent to Eqs. (3.2) and (3.3) of 
( [bretthorst 1988| ), so that the orthogonalization of the basis functions 
and the final expressions follow exactly as in that reference. 



4. Conclusions and Future Work 

This paper has developed an algorithm for estimating time series 
lags, leading to a posterior that is the exponential of a scaled cross 
correlation function. In addition we developed an extension of the 
Bayesian Blocks algorithm to the case where not only are there errors 
in the dependent variable, but where the measurement is a weighted 
integral over a finite range of the independent variable. Work planned 
includes development of numerical algorithms, testing them on 
synthetic and real data, and then making them freely available in the 
form of Matlab programs. I also am working on a similar analysis of 
scaling behavior in time series, where the scalegram - the square of 
the wavelet coefficients, averaged over their location index - is the 
sufficient statistic. 
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